Providing high availability in an active/active appliance cluster

ABSTRACT

A method executes a preempt by a standby database appliance in a high-availability active/active appliance cluster. The appliance cluster includes a transaction processing standby group and a persistent storing standby group. The transaction processing standby group includes a primary active appliance and a standby appliance. One or more processors receive a Hello message from the primary DB appliance. The processor(s) examine a priority field in the Hello message, in order to determine a priority of the standby database appliance according to the persistent state to thereby determine whether the standby database appliance requests a preempt, where the persistent state includes a state of an application and a database of the primary DB appliance. The processor(s) implement a failover in response to the preempt request to thereby take over a duty of the primary DB appliance.

BACKGROUND

The present invention relates to the technology of providing high-availability (HA) clusters, and more particularly, to a method, apparatus and computer program product for providing high availability in an active/active appliance cluster.

An active/active appliance cluster is a cluster of appliances (e.g., servers) in which a primary appliance and a secondary (e.g., failover/backup) appliance are both actively running the same operation/application/service.

In a large data center, high-availability clusters usually ensure that multiple servers or appliances can meet business needs. A high-availability cluster is equipped with sufficient components, implemented, and deployed to thereby meet a functional requirement: sufficient redundancy of components (hardware/software or procedure) to mask defined faults. The purpose of the high-availability cluster is to minimize server-related or appliance-related downtime caused by system errors and reduce business loss caused by the system errors. At present, some commercially available products provide the aforesaid function and characteristic.

In general, an appliance, also known as an Internet appliance, features built-in networking capability and has a specific function; examples include a gateway, router, network attached storage, access point, digital TV set top box, and network file sharing server. For more details about appliances, please refer to IBM® WebSphere® DataPower Series SOA Appliances or Tivoli® ISS Appliances® (IBM, WebSphere, and Tivoli are registered trademarks owned by International Business Machines Corporation in the United States and/or other countries).

Unlike general-purpose computer devices, an appliance is typically designed to serve a specific purpose or provide a specific service and thus is more robust. Compared with general-purpose computer devices, “appliances” are relatively “closed”: their specific operating systems and applications (or drivers) vary with their intended purposes and services.

In a cluster with multiple appliances, high availability represents an important dimension in deployment of appliances. This is especially true of those appliance products which serve as the processing units of an enterprise in a demilitarized zone (DMZ). In a conventional cluster with multiple appliances, a high-availability cluster is usually built with a centralized external persistent storage (such as a self-contained database). From the perspective of an active/active appliance cluster, assuming that transactions processed by an appliance are stateless, as is the case with typical HTTP web pages, an external load balancer is placed in front of the cluster of appliances, and thus the transactions can be easily redirected to the other appliances in the cluster.

Those appliances whose transaction persistent state has to remain unchanged are also required to create a high-availability cluster by means of an external system (such as a database). To this end, it is necessary for appliances in the high-availability cluster to exchange states efficiently, for failover implementation to be transparent to an external partner, and for a system architecture to be scalable in order to support deployment of n nodes without imposing great impacts on performance.

Not only does system maintenance pose a problem, but synchronization of data between active/active appliance clusters is also inefficient; hence, users anticipate a solution whereby active/active appliances are self-contained and thus do not rely upon any external system, such as a centralized external persistent storage (say, a self-contained database), and a load balancer. Furthermore, the users also expect that the solution is scalable to n nodes for deployment.

SUMMARY

In an embodiment of the present invention, a method executes a preempt by a standby database appliance in a high-availability active/active appliance cluster. The appliance cluster includes two standby redundant groups. The two standby redundant groups include a transaction processing standby group and a persistent storing standby group. The transaction processing standby group includes a primary active appliance and at least a standby appliance. The primary active appliance includes a self-balancing module for balancing a load of the appliances in the cluster. The persistent storing standby group is a subset of the transaction processing standby group and includes a primary database (DB) appliance and a standby database appliance. One or more processors receive a Hello message from the primary DB appliance. The processor(s) examine a priority field in the Hello message, in order to determine a priority of the standby database appliance according to the persistent state to thereby determine whether the standby database appliance requests a preempt, where the persistent state includes a state of an application and a database of the primary DB appliance. The processor(s) implement a failover in response to the preempt request to thereby take over a duty of the primary DB appliance.

In an embodiment of the present invention, a computer program product routes data by an appliance in an appliance cluster. The appliance cluster is a high-availability active/active appliance cluster. The computer program product includes a non-transitory computer readable storage medium having program code embodied therewith. The program code is readable and executable by a processor to perform a method of: receiving messages assigned by a self-balancing module for balancing a load of appliances in the appliance cluster, where the appliance cluster comprises two backup standby groups, where the two backup standby groups are a persistent storing standby group and a transaction processing standby group, where the persistent storing standby group is a subset of the transaction processing standby group and comprises a primary database (DB) appliance and a secondary DB appliance, where the transaction processing standby group comprises a primary active appliance and a standby appliance, and where the primary active appliance comprises the self-balancing module; storing persistent storing data generated by processing the messages to a virtual persistent storage, where the virtual persistent storage provides an interface between a persistent storage of the primary DB appliance and an application for processing the messages; and linking the virtual persistent storage to the persistent storage of the primary DB appliance in the persistent storing standby group in response to an appliance that receives the messages not being the primary DB appliance, so as to route the persistent storing data to the persistent storage of the primary DB appliance, thereby sending the persistent storing data to the persistent storage of the primary DB appliance.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.

FIG. 1 is a block diagram of the hardware environment of a cluster comprising a plurality of appliances according to an illustrative embodiment of the present invention;

FIG. 2 is a schematic view of a high-availability cluster created from two standby groups according to an embodiment of the present invention;

FIG. 3A is a schematic view of tasks performed by a transaction processing standby group 200 operating normally according to an embodiment of the present invention;

FIG. 3B is a schematic view of tasks performed by a primary DB appliance 240 operating normally according to an embodiment of the present invention;

FIG. 4 is a schematic view of the network architecture of a transaction processing standby group and a persistent storing standby group according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method whereby an appliance processing transaction module processes a transaction with each appliance in the high-availability cluster according to an embodiment of the present invention;

FIG. 6 is a flow chart of a method whereby a standby database appliance processing module executes a preempt by a standby database appliance of the persistent storing standby group 210 in the high-availability cluster according to an embodiment of the present invention; and

FIG. 7 is a flow chart of a method whereby an application appliance of the transaction processing standby group 200 in the high-availability cluster joins a new persistent storing standby group automatically according to an embodiment of the present invention.

DETAILED DESCRIPTION

Reference throughout this specification to “one embodiment,” “an embodiment,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

As will be appreciated by one skilled in the art, the present invention may be embodied as an appliance, a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.

Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer or server may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Referring now to FIG. 1 through FIG. 7, appliances, methods, and computer program products are illustrated as structural or functional block diagrams or process flowcharts according to various embodiments of the present invention. The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Exemplary Hardware Environment

Referring to FIG. 1, there is shown a schematic block diagram of the hardware environment of an appliance cluster including a plurality of appliances according to an illustrative embodiment of the present invention. In an embodiment, the cluster 100 includes three appliances 100 a, 100 b, and 100 c, and the hardware framework of the appliances is similar to that of IBM WebSphere DataPower Series SOA Appliances or Tivoli ISS Appliances.

Each of the appliances 100 a, 100 b, and 100 c may include: a processor for executing specific applications; a storage device for storing various information and program code; a display device, a communication device, and an input/output device which function as interfaces for communicating with a user; and a peripheral component or other components serving a specific purpose. In another embodiment, the present invention is implemented in another way and thus has fewer or more other devices or components.

A plurality of appliances 100 a, 100 b, and 100 c in a cluster 100 processes a message attributed to an external enterprise partner system (or client computer) and received from a network 120, and sends a result to a backend on a network server of a subsequent enterprise internal system. The message is a packet, a TCP flow, or a transaction.

Referring to FIG. 1, each of the appliances 100 a, 100 b, and 100 c may include a processor 10, a memory 20, and an input/output (I/O) unit 40. The input/output (I/O) bus is a high-speed serial bus, such as a PCI-e bus, or any other bus structure. It is also feasible for the input/output (I/O) bus to be connected in other ways, either directly by means of interconnected components or by means of an additional card. The input/output (I/O) unit 40 can also be coupled to a hard disk drive 50 or a local area network (LAN) adaptor 60. With the LAN adaptor 60, each of the appliances 100 a, 100 b, and 100 c communicates with a user-end computer via a network 120. The network can also be a connection of any type, including a wide area network (WAN) or a local area network (LAN) with static IP, or a temporary connection to the Internet through an Internet service provider (ISP), whether by cable connection or by wireless connection. Persons skilled in the art are able to understand that the network can also have other hardware and software elements (such as an additional computer system, a router, or a firewall) not shown in the accompanying drawings. The memory 20 is a random access memory (RAM), a read-only memory (ROM), or an erasable programmable read-only memory (EPROM or Flash memory). The memory 20 stores an operating system, program code of a dedicated application AP, and various information. An operating system is executed on the processor 10 to coordinate and provide various component controls in the appliances 100 a, 100 b, and 100 c. The processor 10 accesses the memory 20 so as to execute an application AP. The dedicated application comprises code designed according to a specific purpose or a specific service and adapted to perform a specific transaction, so as to process a message received.

An application AP comprises a standby group processing module and a self-balancing module of the present invention. The standby group processing module comprises an appliance processing transaction module and a standby database appliance processing module. The standby group processing module comprises a program module and an instruction which are required for providing high availability in an active/active appliance cluster according to the present invention. The standby group processing module is a module in the application or is implemented in the form of a daemon. However, in another embodiment, it can be implemented in the form of another type of program. The standby group processing module comprises code for executing a program illustrated with FIG. 5 and FIG. 6 and described below.

Persons skilled in the art understand that the hardware of the appliances 100 a, 100 b, and 100 c in FIG. 1 varies with embodiment, and can be supplemented by or replaced with another internal hardware or peripheral apparatus, such as Flash ROM, equivalent non-volatile memory, or CD-ROM.

Referring to FIG. 2, there is shown a schematic view of the high-availability cluster created from two standby groups according to an embodiment of the present invention. The standby groups are a transaction processing standby group 200 and a persistent storing standby group 210, respectively. For illustrative purpose, FIG. 2 shows five appliances 220, 230, 240, 250, and 260. Each of the appliances comprises a dedicated application designed according to a specific purpose or a specific service, the application AP of the present invention, and a persistent storage for storing persistent storing data. The application AP comprises a self-balancing module and a standby group processing module according to the present invention. The persistent storage is a hard disk drive 50, RAID hard disk drive, or in-memory database. The persistent storing data comprises a transaction state and transaction data. The transaction data further comprises metadata, such as a message ID, transaction start and end time, and a transaction result (such as success or failure).

In an embodiment of the present invention, the transaction processing standby group 200 and the persistent storing standby group 210 are created by the conventional Hot Standby Router Protocol (HSRP) developed by Cisco. The Hot Standby Router Protocol is one of the First Hop Redundancy Protocols (FHRP) available today, and its further details are described in RFC 2281. Other redundancy protocols in the prior art include the Virtual Router Redundancy Protocol (VRRP), which is an IETF standard, and the Gateway Load Balancing Protocol (GLBP) developed by Cisco.

Referring to FIG. 2, a network engineer configures five appliances in a first subnet and creates a transaction processing standby group 200 by HSRP. The transaction processing standby group 200 is an active/active appliance cluster, wherein each of the appliances functions as a transaction processing unit. The group of appliances is also known as a “redundant group.” With HSRP, the appliances are configured together to form a first virtual network entity; meanwhile, a first virtual IP address and a first virtual MAC address are created for use by the first virtual network entity.

The network engineer further selects two of the five appliances, such that the two selected appliances are configured in a second subnet, thereby creating the persistent storing standby group 210 by HSRP. With HSRP, two appliances of the persistent storing standby group 210 are configured together to form a second virtual network entity; meanwhile, a second virtual IP address and a second virtual MAC address are created for use by the second virtual network entity. Hence, at this point in time, the five appliances fall into two categories, namely database appliances 240, 250 and application appliances 220, 230, 260.

Transaction Processing Standby Group 200

Different appliances which are attributed to the transaction processing standby group 200 and configured by HSRP communicate and select a primary active appliance which is in possession of the first virtual IP address and the first virtual MAC address. In practice, an active appliance receives, on behalf of the first virtual network entity, all the traffic flow which originates from an external enterprise partner system (or client) 270. The selection is determined in accordance with pre-configured priority or other appropriate rules.

Furthermore, the active appliance also executes a self-balancing module. The self-balancing module may allot the received traffic flow to the other appliances in the transaction processing standby group 200 according to the workload of each appliance in the transaction processing standby group 200. Hence, the external enterprise partner system 270 sends a transaction to the transaction processing standby group 200 by means of the virtual IP address, and the active appliance in the transaction processing standby group 200 receives the transaction, wherein the self-balancing module therein may redirect the transaction to appropriate appliances in the standby group according to the workload of each appliance, as a conventional external load balancer does. Hence, each of the appliances processes messages assigned and attributed to the external enterprise partner system (or client computer) 270, and sends the result to a backend (not shown) on a network server of a subsequent enterprise internal system 280.
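The workload-based allotment performed by the self-balancing module can be sketched as follows. This is a minimal illustration only: the `Appliance` class, the integer `workload` metric, and the least-loaded selection rule are assumptions made for the example, since the specification does not prescribe a particular balancing algorithm.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    name: str
    workload: int = 0  # hypothetical load metric, e.g. in-flight transactions


def pick_target(appliances):
    """Return the appliance with the lowest workload (ties go to the first listed)."""
    return min(appliances, key=lambda a: a.workload)


def dispatch(appliances, transaction_ids):
    """Assign each transaction to the currently least-loaded appliance."""
    assignments = {}
    for tx in transaction_ids:
        target = pick_target(appliances)
        target.workload += 1  # the assigned transaction adds to that appliance's load
        assignments[tx] = target.name
    return assignments


# Appliance names reuse the reference numerals of FIG. 2 purely as labels.
group = [Appliance("220", 2), Appliance("230", 0), Appliance("240", 1)]
plan = dispatch(group, ["tx-1", "tx-2", "tx-3"])
```

In this sketch the active appliance is itself a candidate target, which reflects that every appliance in the transaction processing standby group 200 functions as a transaction processing unit.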

Furthermore, a standby appliance is also selected from the transaction processing standby group 200. The selection is determined in accordance with pre-configured priority or other appropriate rules. The active appliance and the standby appliance share the virtual IP address and the virtual MAC address. In practice, only the active appliance is in possession of the first virtual IP address and the first virtual MAC address and thus receives all the traffic flow on behalf of the virtual network entity.

Given HSRP, as soon as the active appliance fails or is down, the standby appliance takes over the duty of the active appliance and, after a short delay, receives all the traffic flow which originates from the external enterprise partner system (or client) 270 on behalf of the virtual network entity, wherein a self-balancing module therein performs the workload balancing function. At this point in time, a new standby appliance is also selected as needed in accordance with pre-configured priority or other appropriate rules. In fact, the selection of the new standby appliance is optional. This is because, as soon as the active appliance fails, one of the other appliances in the transaction processing standby group 200 can be determined as a new active appliance in accordance with pre-configured priority or other appropriate rules, so as to take over the duty of the original active appliance.

A point to note is that when an appliance is known as an active appliance or known to be operating in an active mode or in an active state, it means that the appliance receives traffic flow, wherein a self-balancing module therein performs the workload balancing function. Likewise, when an appliance is known as a standby appliance or known to be operating in a standby mode or in a standby state, it means that the appliance is a potential substitute for the active appliance.

With HSRP, the active appliance sends a “Hello” message to the standby appliance in the transaction processing standby group 200 periodically by multicast or broadcast. The standby appliance tests whether the active appliance has failed according to whether the “Hello” message is received within a predetermined period of time. If the “Hello” message is not received within the predetermined period of time, the standby appliance will infer that the active appliance has failed and thus will enter the active state to become the new active appliance and take over the duty of the original active appliance.
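The timeout-based failure test described above can be sketched as a small state holder on the standby appliance. The class name `StandbyMonitor`, the explicit `now` arguments, and the 10-second hold time are assumptions chosen for the example; a real implementation would read a clock and receive Hello messages from the network.

```python
class StandbyMonitor:
    """Tracks the last Hello seen from the active appliance (illustrative only)."""

    def __init__(self, hold_time: float, now: float):
        self.hold_time = hold_time  # the "predetermined period of time", in seconds
        self.last_hello = now
        self.state = "standby"

    def on_hello(self, now: float) -> None:
        """Record the arrival time of a Hello message from the active appliance."""
        self.last_hello = now

    def tick(self, now: float) -> str:
        """Enter the active state if no Hello arrived within the hold time."""
        if self.state == "standby" and now - self.last_hello > self.hold_time:
            self.state = "active"
        return self.state


monitor = StandbyMonitor(hold_time=10.0, now=0.0)
monitor.on_hello(now=3.0)
state_before = monitor.tick(now=12.0)  # 9.0 s since the last Hello: stays standby
state_after = monitor.tick(now=13.5)   # 10.5 s since the last Hello: takes over
```

Passing the time in explicitly keeps the sketch deterministic and easy to check; it is a convenience of the example, not part of the described method.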

Persistent Storing Standby Group 210

The persistent storing standby group 210 comprises a primary database (DB) appliance 240 and a secondary (or standby) database appliance 250. The primary DB appliance 240 is in possession of the second virtual IP address and the second virtual MAC address. In practice, the primary DB appliance 240 represents the second virtual network entity. The primary DB appliance 240 is selected to represent the second virtual network entity in accordance with pre-configured priority or another appropriate rule. Hence, the application of each of the appliances 220, 230, 240, 250, and 260 processes a message assigned and attributed to the external enterprise partner system (or client computer) 270, and thus all the persistent storing data (such as the transaction state and the transaction data) created is stored in a persistent storage (not shown) of the primary DB appliance 240 in the persistent storing standby group 210.

The secondary database appliance 250 synchronizes the persistent storing data with the primary DB appliance 240 to ensure that, as soon as the primary DB appliance 240 is down, its duty can be taken over. Hence, it is feasible for the persistent storing standby group 210 to serve as the location of the centralized persistent storage of a high-availability cluster.

Operation of Transaction Processing Standby Group 200 and Persistent Storing Standby Group 210

Referring to FIG. 3A, there is shown a schematic view of tasks performed by a transaction processing standby group 200 operating normally according to an embodiment of the present invention. As shown in the diagram, the primary DB appliance 240 is selected to be an active appliance which is in possession of the first virtual IP address (such as 9.191.1.11) and the first virtual MAC address. In fact, an active appliance receives, on behalf of the first virtual network entity, all the traffic flow which originates from the external enterprise partner system (or client) 270. The self-balancing module therein allots the received traffic flow to the other appliances in the transaction processing standby group 200 according to the workload of each appliance, in the same way as a conventional external load balancer does.

Referring to FIG. 3B, there is shown a schematic view of tasks performed by the primary DB appliance 240 operating normally according to an embodiment of the present invention. As shown in the diagram, the primary DB appliance 240 is an active appliance in possession of the second virtual IP address (such as 192.168.1.1) and the second virtual MAC address. In practice, the primary DB appliance 240 stores, on behalf of the second virtual network entity, the persistent storing data generated by the application of each appliance 220, 230, 240, 250, and 260 to the persistent storage of the primary DB appliance 240 in the persistent storing standby group 210.
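The routing of persistent storing data to the primary DB appliance 240, through the virtual persistent storage mentioned in the Summary, can be sketched in-process as follows. The class names `PersistentStorage` and `VirtualPersistentStorage` and their methods are hypothetical, and the network hop over the second virtual IP address is deliberately replaced by a direct object reference.

```python
class PersistentStorage:
    """Stands in for the persistent storage of one appliance (illustrative only)."""

    def __init__(self, owner: str):
        self.owner = owner
        self.records = []

    def write(self, record: dict) -> None:
        self.records.append(record)


class VirtualPersistentStorage:
    """Interface the application writes to; forwards to whatever storage it is linked to."""

    def __init__(self):
        self._backing = None

    def link(self, storage: PersistentStorage) -> None:
        """Link this virtual storage to the storage of the current primary DB appliance."""
        self._backing = storage

    def store(self, record: dict) -> None:
        if self._backing is None:
            raise RuntimeError("virtual persistent storage is not linked")
        self._backing.write(record)


# Application appliance 220 is not the primary DB appliance, so its virtual
# persistent storage is linked to the storage owned by appliance 240.
primary_db_storage = PersistentStorage(owner="240")
vps_on_220 = VirtualPersistentStorage()
vps_on_220.link(primary_db_storage)
vps_on_220.store({"message_id": "msg-1", "state": "done"})
```

Because the application only ever calls `store`, relinking after a failover would redirect its writes without changing the application code, which is the point of the indirection in this sketch.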

The primary DB appliance 240 and the secondary database appliance 250 in the persistent storing standby group 210 communicate with each other (290) by means of the HSRP “Hello” message having the improved HSRP priority attribute of the present invention. The HSRP priority attribute provides a custom-made priority field for carrying data indicative of the persistent state of the sender appliance, such as data indicative of the state of an application or database. Hence, as soon as the HSRP “Hello” message is received, the appliance having received the HSRP “Hello” message knows the state of the application or database of the sender appliance. The secondary database appliance 250 determines, according to the received data indicative of the persistent state, whether it is necessary to preempt the primary DB appliance 240, that is, whether to replace the primary DB appliance 240, thereby taking over its duty on behalf of the second virtual network entity. Hence, given the improved HSRP priority field of the present invention, the determination as to whether the secondary database appliance 250 takes over the primary DB appliance 240 no longer depends on whether the primary DB appliance 240 fails (or is down) according to the conventional HSRP.

For instance, even if the primary DB appliance 240 is still active, an error on it (for example, a failure of its persistent storage, which is indicated as a failure state in the HSRP priority field) causes the secondary database appliance 250 to receive data which is carried in the HSRP priority field and indicative of the persistent state, so that the secondary database appliance 250 raises its own priority, implements failover, and takes over the duty of the primary DB appliance 240. At this point in time, the secondary database appliance 250 sends an HSRP COUP message to the primary DB appliance 240 to preempt the primary DB appliance 240 and thus take possession of the second virtual IP address to thereby represent the second virtual network entity.
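The preempt decision taken by the secondary database appliance 250 can be sketched as a pure function over a received Hello message. The `Hello` dataclass, its `storage_ok` flag (a decoded view of the persistent state carried in the priority field), and the rule of raising the priority to one above the sender's are assumptions made for illustration; the specification states only that the priority is enhanced and a COUP message is sent.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    sender: str
    priority: int
    storage_ok: bool  # hypothetical: decoded from the persistent-state bits


def decide_preempt(own_priority: int, hello: Hello):
    """Return (new_priority, send_coup) for the standby database appliance.

    The primary may still be alive and sending Hellos; a reported storage
    failure alone is enough to trigger the preempt in this sketch.
    """
    if hello.storage_ok:
        return own_priority, False
    # Raise our priority above the sender's so that the preempt succeeds.
    new_priority = max(own_priority, hello.priority + 1)
    return new_priority, True


healthy = decide_preempt(90, Hello(sender="240", priority=100, storage_ok=True))
failed = decide_preempt(90, Hello(sender="240", priority=100, storage_ok=False))
```

Priorities are treated as plain integers here; fitting the result into a fixed-width field is left out of the sketch.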

Furthermore, in an embodiment of the present invention, with HSRP, the persistent storing standby group 210 also exchanges with application appliances 220, 230, 260 an HSRP “Hello” message having the improved HSRP priority field and attributed to appliances in the persistent storing standby group 210. Hence, after the secondary database appliance 250 has taken over the duty of the primary DB appliance 240, one of the application appliances of the transaction processing standby group 200 is automatically selected to join the persistent storing standby group 210 in accordance with pre-configured priority or other appropriate rules to thereby form a new persistent storing standby group. Related details are described later and illustrated with FIG. 7.

Priority Attribute

According to an embodiment of the present invention, the improved HSRP priority attribute includes data indicative of the following:

1. The state of a local persistent storage;

2. Whether the secondary database appliance 250 needs to preempt the primary DB appliance 240, that is, replace the primary DB appliance 240 and thereby take over its duty on behalf of the second virtual network entity; and

3. The state of an application level object, which depends on a dedicated application for use with appliances designed according to a specific purpose or a specific service.

According to an embodiment of the present invention, a custom-made priority field of the improved HSRP priority attribute contains 8 bits. In this regard, bit 1 indicates whether the HSRP state of the appliance is the active state or standby state. Bit 2 through bit 4 indicate the state of the persistent storage of the primary DB appliance 240 and the secondary database appliance 250. They are illustrated with Table 1 below.

TABLE 1

bit 2, bit 3, bit 4    What it Means
111                    persistent storage of primary DB appliance 240 is active
101                    persistent storage of primary DB appliance 240 is alone
010                    persistent storage of secondary database appliance 250 is active
000                    persistent storage of secondary database appliance 250 is alone

If the persistent storage is active, there is a link between the primary DB appliance 240 and the persistent storage of the secondary database appliance 250. If the persistent storage is alone, the link does not exist. Hence, if the state of bit 2 through bit 4 is 101, the persistent storage of the primary DB appliance 240 has probably failed, and the secondary database appliance 250 has its priority increased to implement failover, thereby taking over the duty of the primary DB appliance 240.
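The bit layout of Table 1 can be illustrated with a minimal Python sketch. The specification does not state the bit order within the byte, the value of bit 1 that denotes the active state, or the meaning of bits 5 through 8; the sketch assumes that bit 1 is the most significant bit, that a value of 1 in bit 1 means the active state, and that the remaining bits are ignored. The names `decode_priority`, `PriorityField`, and `primary_storage_suspect` are hypothetical.

```python
from dataclasses import dataclass

# Table 1: (bit 2, bit 3, bit 4) -> (appliance whose storage is described, storage state)
STORAGE_STATES = {
    0b111: ("primary", "active"),
    0b101: ("primary", "alone"),
    0b010: ("secondary", "active"),
    0b000: ("secondary", "alone"),
}


@dataclass(frozen=True)
class PriorityField:
    hsrp_active: bool    # bit 1 (assumed: 1 = active state, 0 = standby state)
    storage_owner: str   # "primary" or "secondary"
    storage_state: str   # "active" (link exists) or "alone" (no link)


def decode_priority(value: int) -> PriorityField:
    """Decode the 8-bit custom-made priority field (bit 1 assumed to be the MSB)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("priority field must fit in 8 bits")
    hsrp_active = bool((value >> 7) & 0b1)   # bit 1
    storage_bits = (value >> 4) & 0b111      # bits 2 through 4
    if storage_bits not in STORAGE_STATES:
        raise ValueError(f"bit pattern {storage_bits:03b} is not listed in Table 1")
    owner, state = STORAGE_STATES[storage_bits]
    return PriorityField(hsrp_active, owner, state)


def primary_storage_suspect(field: PriorityField) -> bool:
    """True for pattern 101: the primary's persistent storage has probably failed."""
    return field.storage_owner == "primary" and field.storage_state == "alone"
```

For example, under these assumptions `decode_priority(0b11010000)` describes an appliance in the active HSRP state whose primary persistent storage is alone, which is the condition under which the secondary database appliance 250 has its priority increased.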

Network Architecture of Transaction Processing Standby Group 410 and Persistent Storing Standby Group 420

FIG. 4 is a schematic view of the network architecture of a transaction processing standby group and a persistent storing standby group according to an embodiment of the present invention. For illustrative purposes, FIG. 4 shows three appliances 430, 440, and 450. Three exchangers 460, 470, 480 are required. The exchangers (also known as tier-2 apparatuses), which are hardware apparatuses operating at the data link layer, divide a local area network (LAN) into separate collision domains. The exchangers are usually implemented in the form of an appliance designed according to a specific purpose or a specific service.

Referring to FIG. 4, the network engineer configures the three appliances in a first subnet and creates a transaction processing standby group 410 by means of HSRP, the exchanger 460, and a port of an Ethernet interface of each of the appliances. The appliances are configured together to form a first virtual network entity; meanwhile, the first virtual IP address (such as 9.191.1.4) and the first virtual MAC address are created for use by the first virtual network entity.

The network engineer further selects two of the three appliances, configures the two selected appliances in a second subnet, and creates a persistent storing standby group 420 by means of HSRP, the exchanger 470, and another port of an Ethernet interface of each of the appliances. The two appliances of the persistent storing standby group 420 are configured together to form a second virtual network entity; meanwhile, a second virtual IP address (such as 192.168.1.3) and a second virtual MAC address are created for use by the second virtual network entity.

A network link between a backend 490 on a network server of a subsequent enterprise internal system and the three appliances is created by means of the exchanger 480 and a third port of an Ethernet interface of each of the appliances. As shown in the diagram, the exchanger 460 and the exchanger 480 process the traffic flow from the external enterprise partner system (or client) 270 and through the high-availability cluster and then send the result to the backend 490 on a network server of a subsequent enterprise internal system. The exchanger 470 accesses the persistent storing data (such as the transaction state and the transaction data) of the appliances in the high-availability cluster.

Standby Group Processing Module

A standby group processing module comprises an appliance processing transaction module and a standby database appliance processing module.

Appliance Processing Transaction Module

Referring to FIG. 5, there is shown a flow chart of a method whereby an appliance processing transaction module processes a transaction with each appliance in the high-availability cluster according to an embodiment of the present invention. The embodiment of the present invention is illustrated with FIG. 3A and FIG. 5.

Step 510: receiving messages distributed by a self-balancing module executed by the active appliance in the transaction processing standby group 200 in the high-availability cluster and derived from the external enterprise partner system (or client computer) 270. The messages include a packet or a transaction.

Step 520: storing, to a virtual persistent storage, persistent storing data generated by processing the messages by a dedicated application for use with appliances designed according to a specific purpose or a specific service. The persistent storing data comprises a transaction state and transaction data, and the transaction data further comprises metadata, such as a message ID, transaction start and end time, and a transaction result (say, success or failure). The virtual persistent storage provides an interface between a persistent storage and an application for processing the messages. The application accesses the persistent storing data related to the transaction (the transaction state and the transaction data, such as the metadata) through the virtual persistent storage; hence, from the perspective of all the appliances in the cluster, a failure of the primary database (DB) appliance is “transparent.”

Step 530: determining whether the appliance itself is a primary DB appliance of the persistent storing standby group 210.

Step 540: storing the persistent storing data to a local persistent storage if the appliance itself is a primary DB appliance. If the appliance itself is a primary DB appliance of the persistent storing standby group 210, the appliance is linked to the local persistent storage of the appliance through the virtual persistent storage.

Step 550: executing optimization of the persistent storing data, such as data compression, encryption, or caching, to allow data to be transmitted efficiently, if the appliance itself is not a primary DB appliance. In fact, step 550 is optional.

Step 560: linking the appliance to a persistent storage of a primary DB appliance of the persistent storing standby group 210 through a virtual persistent storage of the appliance. Hence, if the appliance wants to store the persistent storing data, the data is routed to a persistent storage of the primary DB appliance, and thus the data is sent to the remote persistent storage of the primary DB appliance. From the perspective of all the appliances in the cluster, a failure of the primary DB appliance is “transparent,” because the access to the persistent storing data is effectuated through the persistent interface.

The process ends at terminator block 570.
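Steps 520 through 560 can be sketched as follows in Python. This is only an illustration under stated assumptions, not the implementation of the embodiment: storages are modeled as in-memory dictionaries, the link to the primary DB appliance is a plain object rather than a network connection, JSON is an assumed record encoding, and zlib compression stands in for the optional optimization of step 550. The class names `PersistentStorage`, `PrimaryLink`, and `VirtualPersistentStorage` are hypothetical.

```python
import json
import zlib


class PersistentStorage:
    """In-memory stand-in for one appliance's physical persistent storage."""

    def __init__(self):
        self.records = {}

    def put(self, message_id, payload):
        self.records[message_id] = payload


class PrimaryLink:
    """Hypothetical link to the persistent storage of the primary DB appliance."""

    def __init__(self, primary_storage):
        self.primary_storage = primary_storage

    def send(self, message_id, payload, compressed):
        # The receiving side undoes the transfer optimization before storing.
        if compressed:
            payload = zlib.decompress(payload)
        self.primary_storage.put(message_id, payload)


class VirtualPersistentStorage:
    """Interface between the application and whichever storage is in use."""

    def __init__(self, is_primary_db, local, primary_link, compress=True):
        self.is_primary_db = is_primary_db
        self.local = local
        self.primary_link = primary_link
        self.compress = compress

    def store(self, message_id, record):
        # Step 520: the application hands its persistent storing data to this interface.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        if self.is_primary_db:                    # step 530
            self.local.put(message_id, payload)   # step 540: local persistent storage
            return "local"
        if self.compress:                         # step 550 (optional optimization)
            payload = zlib.compress(payload)
        # Step 560: route the data to the primary DB appliance's storage.
        self.primary_link.send(message_id, payload, self.compress)
        return "remote"
```

The application calls `store` in the same way on every appliance; only the virtual persistent storage knows whether the data lands locally or on the primary DB appliance, which is the sense in which the location of the physical storage is hidden from the application.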

Standby Database Appliance Processing Module

Referring to FIG. 6, there is shown a flow chart of a method whereby a standby database appliance processing module executes a preempt by a standby database appliance of the persistent storing standby group 210 in the high-availability cluster according to an embodiment of the present invention.

Step 610: receiving a Hello message from a primary DB appliance.

Step 620: examining a priority field in the Hello message for a persistent state of the primary DB appliance to determine its priority.

Step 630: determining whether the standby database appliance requests a preempt, that is, replacement of the primary DB appliance. Given the improved HSRP priority field of the present invention, the determination as to whether the secondary database appliance 250 takes over the primary DB appliance 240 no longer depends on whether the primary DB appliance 240 fails (or is down) according to the conventional HSRP, but depends on the persistent state of the primary DB appliance.

Step 640: executing HSRP preempt if a preempt is requested according to the persistent state of the primary DB appliance. The preempt increases the priority of the standby database appliance to implement failover and thus take over the duty of the primary DB appliance 240.

Step 650: creating a new persistent storing standby group. An appliance is selected by a network engineer from the application appliances of the transaction processing standby group 200 to join the new persistent storing standby group. Alternatively, referring to FIG. 7, one of the application appliances of the transaction processing standby group 200 is automatically selected by pre-configured priority or other appropriate rules to join the new persistent storing standby group.
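Steps 610 through 640 can be sketched as follows in Python. The sketch assumes that the persistent state arrives as the three-bit pattern of Table 1 and that pattern 101 is the only condition that triggers a preempt; the starting priority of 100 matches the HSRP default, while `PRIORITY_BOOST`, the `Hello` record, and the `StandbyDbAppliance` class are hypothetical. Sending the COUP message is modeled by appending to a list rather than by real network I/O, and step 650 is not covered.

```python
from dataclasses import dataclass, field

PRIMARY_STORAGE_ALONE = 0b101  # Table 1: primary's persistent storage is alone
PRIORITY_BOOST = 50            # hypothetical increment, not taken from the source


@dataclass
class Hello:
    sender: str        # name of the primary DB appliance that sent the message
    storage_bits: int  # bits 2 through 4 of the priority field


@dataclass
class StandbyDbAppliance:
    name: str
    priority: int = 100
    role: str = "standby"
    sent: list = field(default_factory=list)

    def on_hello(self, hello: Hello) -> bool:
        """Handle one Hello message (step 610); return True if a preempt was executed."""
        # Step 620: examine the priority field for the primary's persistent state.
        primary_state = hello.storage_bits
        # Step 630: decide whether to request a preempt.
        if primary_state != PRIMARY_STORAGE_ALONE:
            return False
        # Step 640: raise own priority, send COUP, and take over the primary's duty.
        self.priority += PRIORITY_BOOST
        self.sent.append(("COUP", hello.sender))
        self.role = "active"
        return True
```

Note that the decision is driven by the reported persistent state alone: a Hello message with pattern 111 leaves the standby appliance untouched even though the same sender would be preempted on pattern 101 while still being up.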

Referring to FIG. 7, there is shown a flow chart of a method whereby an application appliance of the transaction processing standby group 200 in the high-availability cluster joins a new persistent storing standby group automatically according to an embodiment of the present invention.

Step 710: receiving a Hello message from a primary DB appliance.

Step 720: determining whether the appliance has the highest priority among the application appliances of the transaction processing standby group 200.

Step 730: examining a priority field in the Hello message for a persistent state of the primary DB appliance to determine its priority.

Step 740: determining whether the standby database appliance requests a preempt, that is, replacement of the primary DB appliance.

Step 750: joining the standby database appliance to create a new persistent storing standby group and thereby form a new virtual network entity representing a persistent storing standby group.
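The decision in steps 720 through 740 can be sketched as a single predicate in Python. The sketch assumes that each application appliance knows the pre-configured priorities of its peers, that ties are broken by appliance name (a rule not given in the source), and that a preempt is inferred from bit pattern 101 of Table 1. The function name `should_join_new_group` and the appliance names in the usage note are hypothetical.

```python
PRIMARY_STORAGE_ALONE = 0b101  # Table 1: primary's persistent storage is alone


def should_join_new_group(own_name, app_priorities, storage_bits):
    """Return True if this application appliance should join the new group.

    own_name       -- name of the appliance evaluating the Hello message
    app_priorities -- pre-configured priority of each application appliance
    storage_bits   -- bits 2 through 4 of the priority field in the Hello message
    """
    # Step 720: only the highest-priority application appliance proceeds.
    # Ties are broken by name, which is an assumption of this sketch.
    highest = max(app_priorities, key=lambda name: (app_priorities[name], name))
    if own_name != highest:
        return False
    # Steps 730 and 740: the primary's persistent state must call for a preempt.
    return storage_bits == PRIMARY_STORAGE_ALONE
```

With priorities `{"app-220": 90, "app-230": 110, "app-260": 100}`, only `app-230` returns `True` when the Hello message carries pattern 101, so exactly one application appliance proceeds to step 750.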

The aforesaid step of creating standby groups by means of HSRP to form a virtual network entity is regarded as prior art; further details are described in RFC 2281. Furthermore, although the aforesaid embodiment of the present invention is exemplified by Hot Standby Router Protocol (HSRP), the present invention is not limited thereto. In another embodiment, the present invention is also applicable to systems or appliances governed by other First Hop Redundancy Protocols (FHRP), such as Virtual Router Redundancy Protocol (VRRP) and Gateway Load Balancing Protocol (GLBP).

As described herein, the present invention enables an active/active high-availability appliance cluster to be created from two standby groups. The standby groups are also known as redundant or backup groups. The standby groups are a first layer transaction processing standby group and a second layer persistent storing standby group, respectively. The first layer transaction processing standby group selects an appliance for executing a self-balancing module and allotting a received traffic flow to other appliances in the transaction processing standby group according to the workload of each appliance in the transaction processing standby group. Hence, according to the present invention, no external additional load balancer is required. The second layer persistent storing standby group is for internal use by the cluster and accesses data stored in a physical persistent storage in a specific appliance of the persistent storing standby group by means of a persistent interface. Hence, from the perspective of all the appliances in the cluster, a failure of the primary database (DB) appliance is “transparent.” Hence, according to the present invention, no external additional centralized persistent storage (such as a self-contained database) is required. Furthermore, the present invention is not restrictive of the quantity of appliances.
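The allotment of traffic by the self-balancing module can be pictured with a short Python sketch. The source states only that traffic is allotted according to the workload of each appliance; the least-loaded policy, the use of an outstanding-message count as the workload measure, the tie-break by name, and the function names `pick_appliance` and `dispatch` are all assumptions made for illustration.

```python
def pick_appliance(workloads):
    """Return the name of the least-loaded appliance (ties broken by name)."""
    if not workloads:
        raise ValueError("no appliances available")
    return min(workloads, key=lambda name: (workloads[name], name))


def dispatch(messages, workloads):
    """Assign each message to an appliance and update the workload counts in place."""
    assignments = {}
    for message in messages:
        target = pick_appliance(workloads)
        assignments[message] = target
        workloads[target] += 1  # one more outstanding message on that appliance
    return assignments
```

Because the selection runs on an appliance that is itself a member of the transaction processing standby group, this role needs no device outside the cluster.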

According to an embodiment of the present invention, a method of providing high availability in an active/active appliance cluster is provided, wherein the appliance cluster comprises two standby (redundant) groups, namely a transaction processing standby group and a persistent storing standby group, wherein the transaction processing standby group comprises a primary active appliance and at least a standby appliance, wherein the primary active appliance comprises a self-balancing module for balancing the load of the appliances in the cluster, wherein the persistent storing standby group is a subset of the transaction processing standby group and comprises a primary database (DB) appliance and a secondary DB appliance, the method comprises the steps of: receiving messages assigned by the self-balancing module; storing persistent storing data generated by processing the messages to a virtual persistent storage, wherein the virtual persistent storage provides an interface between a persistent storage of the primary DB appliance and an application for processing the messages; and linking the virtual persistent storage to the persistent storage of the primary DB appliance in the persistent storing standby group if the appliance is not the primary DB appliance.

According to another embodiment of the present invention, a computer program product comprises a computer-readable medium stored with a program code executable on an appliance to implement the aforesaid method so as to provide high availability in an active/active appliance cluster.

According to another embodiment of the present invention, an appliance comprises: a bus; a memory connected to the bus, wherein the memory comprises an instruction; a processing unit connected to the bus, wherein the processing unit executes the instruction to execute the aforesaid method so as to provide high availability in an active/active appliance cluster.

Furthermore, according to one or more embodiments of the present invention, a method, apparatus and computer program product for providing high availability in an active/active appliance cluster are provided. The appliance cluster includes two standby groups, namely a transaction processing standby group and a persistent storing standby group. The transaction processing standby group includes a primary active appliance and at least a standby appliance. The primary active appliance includes a self-balancing module to balance load of appliances in the cluster. The persistent storing standby group is the subset of the transaction processing standby group, and includes a primary database (DB) appliance and a secondary DB appliance. The method includes the steps of: receiving messages distributed by the self-balancing module; storing persistent storing data generated by processing the messages to a virtual persistent storage, wherein the virtual persistent storage is an interface between an application processing the messages and a persistent storage; and linking the virtual persistent storage to the persistent storage of the primary DB appliance in the persistent storing standby group, if the appliance is not the primary DB appliance.

Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present invention should be or are in any single embodiment of the invention. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, discussion of the features and advantages, and similar language, throughout this specification may, but does not necessarily, refer to the same embodiment.

Furthermore, the described features, advantages, and characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the invention may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the invention.

The foregoing preferred embodiments are provided to illustrate and disclose the technical features of the present invention, and are not intended to be restrictive of the scope of the present invention. Hence, all equivalent variations or modifications made to the foregoing embodiments without departing from the spirit embodied in the disclosure of the present invention should fall within the scope of the present invention as set forth in the appended claims.

What is claimed is:
 1. A method for executing a preempt by a standby database appliance in a high-availability active/active appliance cluster, wherein the appliance cluster comprises two standby redundant groups, wherein the two standby redundant groups comprise a transaction processing standby group and a persistent storing standby group, wherein the transaction processing standby group comprises a primary active appliance and at least a standby appliance, wherein the primary active appliance comprises a self-balancing module for balancing a load of the appliances in the cluster, wherein the persistent storing standby group is a subset of the transaction processing standby group and comprises a primary database (DB) appliance and a standby database appliance, and wherein the method comprises: receiving, by one or more processors, a Hello message from the primary DB appliance; examining, by one or more processors, a priority field in the Hello message, wherein said examining determines a priority of the standby database appliance according to the persistent state to thereby determine whether the standby database appliance requests a preempt, wherein the persistent state comprises a state of an application and a database of the primary DB appliance; and implementing, by one or more processors, failover in response to the preempt request to thereby take over a duty of the primary DB appliance.
 2. The method of claim 1, further comprising: creating, by one or more processors, a new persistent storing standby group.
 3. A computer program product for routing data by an appliance in an appliance cluster, wherein the appliance cluster is a high-availability active/active appliance cluster, wherein the computer program product comprises a non-transitory computer readable storage medium having program code embodied therewith, the program code readable and executable by a processor to perform a method comprising: receiving messages assigned by a self-balancing module for balancing a load of appliances in the appliance cluster, wherein the appliance cluster comprises two backup standby groups, wherein the two backup standby groups are a persistent storing standby group and a transaction processing standby group, wherein the persistent storing standby group is a subset of the transaction processing standby group and comprises a primary database (DB) appliance and a secondary DB appliance, wherein the transaction processing standby group comprises a primary active appliance and a standby appliance, and wherein the primary active appliance comprises the self-balancing module; storing persistent storing data generated by processing the messages to a virtual persistent storage, wherein the virtual persistent storage provides an interface between a persistent storage of the primary DB appliance and an application for processing the messages; and linking the virtual persistent storage to the persistent storage of the primary DB appliance in the persistent storing standby group in response to an appliance that receives the messages not being the primary DB appliance, so as to route the persistent storing data to the persistent storage of the primary DB appliance, thereby sending the persistent storing data to the persistent storage of the primary DB appliance.
 4. The computer program product of claim 3, wherein the method further comprises: linking the virtual persistent storage to a local persistent storage if the appliance that receives the messages is the primary DB appliance, so as to route the persistent storing data to the local persistent storage, thereby sending the data to the local persistent storage.
 5. The computer program product of claim 4, wherein the persistent storing data comprises a transaction state and a transaction data of a transaction executed by the primary DB appliance.
 6. The computer program product of claim 5, wherein the transaction data comprises a metadata, such as a message ID, transaction start and end time, and a transaction result, wherein the transaction result is either a success or a failure of the transaction executed by the primary DB appliance.
 7. The computer program product of claim 5, wherein the standby group is one of a hot standby router protocol (HSRP) group and a virtual router redundant protocol (VRRP) group.
 8. The computer program product of claim 5, wherein the messages are one of a packet, a TCP flow, and a transaction.
 9. The computer program product of claim 3, wherein the method further comprises: optimizing the persistent storing data by data compressing, encrypting, and caching the persistent storing data, wherein said optimizing is performed before said linking the virtual persistent storage to a persistent storage of the primary DB appliance of the persistent storing standby group.